Abstract: Detecting the transportation mode of a user is 
important for a wide range of applications. A number of 
recent systems have addressed the transportation mode detection 
problem using ubiquitous mobile phones, leveraging GPS, the 
inertial sensors, and/or information from multiple cell towers. 
However, these phone sensors have high energy consumption, are 
limited to a small subset of phones (e.g. 
high-end phones or phones that support neighbouring cell tower 
information), cannot work in certain areas (e.g. inside tunnels 
for GPS), and/or work only from the user side. 

In this paper, we present a transportation mode detection 
system, MonoSense, that leverages the phone's serving cell 
information only. The basic idea is that the phone speed can be 
correlated with features extracted from both the serving cell 
tower ID and the received signal strength (RSS) from it. To achieve 
high detection accuracy with this limited information, MonoSense 
leverages diversity along multiple axes to extract novel features. 
Specifically, MonoSense extracts features from both the time 
and frequency domain information available from the serving 
cell tower over different sliding window sizes. More importantly, 
we also show that the logarithmic and linear RSS scales 
provide different information about the movement of a 
phone, further enriching the feature space and leading to higher 
accuracy. 

Evaluation of MonoSense using 135 hours of cellular traces 
covering 485 km and collected by four users using different 
Android phones shows that it can achieve an average precision 
and recall of 89.26% and 89.84% respectively in differentiating 
between the stationary, walking, and driving modes using only the 
serving cell tower information, highlighting MonoSense's ability to 
enable a wide set of intelligent transportation applications. 

I. Introduction 

The inference of the human transportation mode (e.g. walking, 
driving, etc.) is important for a wide range of applications 
such as human behavior monitoring [1], road traffic estimation 
[2], and evaluating transportation related measures and policies 
[3]-[5], among others. During the last decade, there has been 
rapid growth in the sensing capabilities of commodity phones 
combined with their ease of programming and large market 
penetration rate. Therefore, a number of unobtrusive systems 
that leverage the phone's different sensors have been proposed 
||6)-p^. Specifically, GPS-based systems |[^-Q that depend 
on the GPS location or its alternatives P^-llSl, systems based on 
inertial sensors (mainly the accelerometer) ||T0)-|[T^, or a combination 
of both can provide different accuracy for transportation mode 
detection with different granularities. However, these systems 
suffer from high energy consumption, do not work everywhere 
(e.g. GPS does not work in tunnels and urban canyons), work 
only on smart phones, work only in special phone positions (e.g. 
outside the pocket with a clear line of sight to the satellites or in 
hand), and do not work from the cellular provider side. 

To address these issues, a number of systems have been 
recently introduced that leverage cellular network information 
only (multiple cell tower IDs and the received signal strength 
(RSS) from them), e.g. [19]-[23], to detect whether a user 
is still, walking, or in a motorized transport. Nevertheless, 
previous cellular-data-based systems require information 
from the serving cell tower as well as from all neighboring cell 
towers. Given that the majority of Android 
phones, which account for more than 80% of the smart 
phone market, only provide the serving cell tower information 
[24], this sparks the need for new methods that can detect the 
mode of transportation accurately using information from only a 
single cell tower, from either the phone or the cellular provider 
side, and that work with any phone (including low-end phones). 

In this paper, we propose MonoSense, a system that leverages 
the ID and RSS information from the serving cell 
tower only, from any commodity cellular phone, to differentiate 
between three human modes of transportation: stationary, 
walking, and driving. The basic idea is that different modes 
of transportation can be mapped to different speeds. These 
speeds, in turn, can be correlated with features extracted from 
the serving cell tower ID and RSS. To address the confusion 
among different transportation modes with the limited available 
information, MonoSense draws on two main concepts: 
(a) a novel observation about the input RSS and 
(b) feature diversity. For the former, we show that the 
linear and logarithmic scales of the RSS provide different 
information about the movement pattern, leading to more 
accurate differentiation between the different transportation 
modes. For the latter, the diversity in the RSS scales is 
combined with the diversity of features extracted from both 
the time and frequency domains as well as the diversity of the 
window sizes the features are extracted from, expanding the 
space of available features and providing a better possibility 
of removing the ambiguity between classes. 

We evaluate MonoSense using real-world data collected by 
four persons using different Android phones with different 
cellular operators over a period of eight months covering 135 
hours of cellular traces. Our results show that MonoSense 
can achieve an average precision and recall of 89.26% and 
89.84% respectively in differentiating between the stationary, 
walking, and driving modes using only the serving cell tower 
information. 

In summary, our contribution in this paper is threefold: 

• We provide the architecture and details of the MonoSense 
system that can provide accurate, ubiquitous, and 
energy-efficient transportation mode detection using the 
information from the serving cell tower only. 

• We show that the RSS linear scale can provide independent 
information about the RSS in addition to the 
RSS logarithmic scale, which is the only scale used in 
traditional cellular-based transportation mode detection 
systems. This extends the feature pool and helps in handling 
the limited information available to MonoSense. 

• We implement and evaluate MonoSense in typical 
environments with different phones, operators, and users. 


The remainder of this paper is organized as follows. Section II 
presents the MonoSense system architecture as well as 
details of the different system components. We evaluate the 
system in Section III and conclude in Section IV. 


II. The MonoSense System 


In this section, we start by providing an overview of the 
MonoSense architecture and its principle of operation, followed 
by the details of the different system components, 
mainly: preprocessing, feature extraction, differences between 
the RSS logarithmic and linear scales, and the classifier used. 
We end the section with a discussion of different aspects of 
MonoSense. 


A. Overview 

Figure 1 shows the system architecture. The only information 
available to MonoSense to detect the mode of transportation 
is the serving cell ID and the associated RSS. The 
basic idea that MonoSense builds on is that some features 
extracted from these two sources of data can be correlated 
with the user speed. For example, a phone in a fast moving 
car would encounter faster changes in the RSS and see more 
changes in the associated cell towers compared to a stationary 
phone (within the same time duration). This maps to a higher 
difference between adjacent RSS readings, higher variance 
in RSS, and higher handoff (changes in the serving cell 
tower ID) frequency, as shown in Figure 2. However, due to 
the limited information available from the serving cell tower 
only, MonoSense diversifies its feature pool to increase the 
classification accuracy, including using features from the linear 
and logarithmic RSS scales, in the time and frequency domains, 
and over different RSS window sizes. 

The system starts by collecting a stream of serving cell 
tower information (d = d_0, d_1, ...), where each d_i is an 
ordered pair (ID_i, RSS_i) representing the ID of the serving 
cell tower and the associated received signal strength (RSS) in 
logarithmic scale (this is the default scale used by the Android 
API) at sample i. These samples can be collected either from 
the cell phone or from the cellular provider side. For the cell phone 
case, since this information is available from almost all phones 
during normal phone operation, MonoSense consumes 
no extra energy, making it a ubiquitous and energy-efficient 
solution. 

Fig. 1: MonoSense system architecture. 

The input data stream is then pre-processed to filter noisy 
data and then the RSS is mapped from the logarithmic to the 
linear scale. It is then passed to a feature extractor to extract 
different features in the time and frequency domain over 
different sliding window sizes to enrich the feature space. 

A decision tree classifier is then applied to differentiate 
between the three modes of transportation, corresponding to 
three different ranges of speed: stationary (speed almost zero 
km/h), walking (maximum speed about 3-6 km/h), and in 
vehicle (free running speeds from 40-100 km/h). Note that 
the different modes may overlap in some cases. For example, 
in traffic congestion, the car speed can approach the user 
walking speed. These cases may reduce the accuracy of the 
classifier. Nevertheless, the different features help in reducing 
the ambiguity in this case, as we quantify in Section III. 


B. Preprocessing 

The goal of this module is to reduce the noise in the 
input data, mainly due to the ping-pong effect. Specifically, 
due to the noisy nature of wireless propagation and the 
unpredictable load on the cell tower, the user's serving cell tower 
can change back and forth between different nearby 
cells. This is called the ping-pong phenomenon [25], [26]. To 
handle this phenomenon, we apply a smoothing filter, where 
a low number of samples from a certain cell tower between 
two groups of samples from another dominant cell tower are 
replaced by the dominant cell ID. 
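
The smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `smooth_ping_pong` and the threshold `max_short_run` (what counts as "a low number of samples") are assumptions, since the paper does not state the value used.

```python
from itertools import groupby


def smooth_ping_pong(cell_ids, max_short_run=3):
    """Replace short runs of a cell ID that sit between two runs of the
    same other (dominant) cell ID with that dominant ID.

    max_short_run is a hypothetical threshold; the paper does not give one.
    """
    # Run-length encode the ID stream: [[cell_id, run_length], ...]
    runs = [[cid, len(list(group))] for cid, group in groupby(cell_ids)]

    for i in range(1, len(runs) - 1):
        prev_id, next_id = runs[i - 1][0], runs[i + 1][0]
        # A short run surrounded by the same cell on both sides is
        # treated as a ping-pong artifact and absorbed.
        if prev_id == next_id and runs[i][1] <= max_short_run:
            runs[i][0] = prev_id

    # Expand the runs back into a per-sample sequence of the same length.
    return [cid for cid, length in runs for _ in range(length)]
```

A genuine handoff, where the phone moves to a new cell and stays there, is left untouched because the run is not followed by a return to the previous cell.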


C. Time and Frequency Domain Features 

All features in MonoSense are calculated within a 
non-overlapping sliding window with a fixed size. Different 
window sizes are used in parallel to capture different granularities. 
We combine the following features from both the time and 
frequency domains to enrich the feature set: 

Time domain features: 

These features are extracted from the time domain for 
different window sizes, covering changes in both the cell ID 
and RSS values. 

1) Number of unique serving cell IDs within a time window: 
For a fixed window size, the higher this number, 
the higher the speed (Figure 2a). This is directly related 
to the handoff frequency. 

2) Average cell residence time: The average dwell 
time spent in each serving cell. The higher this number, 
the lower the phone speed (Figure 2b). 

3) RSS variance: Refers to the variance of the signal 
strength within the feature extraction window. This is 
based on the fact that higher speeds lead to faster 
changes and a noisier signal in the RSS and hence higher 
variance (Figure 2). 

4) Average RSS difference between consecutive measurements: 
This is similar to the previous feature. In the 
extreme case, when the phone is stationary, one should 
expect that consecutive readings will be almost identical, 
which is completely different from the high speed case 
(Figure 2c). 

Fig. 2: Effect of the speed on the different time domain features. All features are extracted within a certain window size. 

Fig. 3: Effect of phone speed on the signal energy. 
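
The four time domain features above can be sketched for a single window as follows. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the residence time is taken as the window duration divided by the number of consecutive same-cell runs, the variance is the population variance, and the consecutive difference is taken in absolute value. The paper does not spell out these details.

```python
from statistics import pvariance


def time_domain_features(cell_ids, rss, sample_period_s=1.0):
    """Compute the four time domain features for one non-empty window.

    cell_ids and rss are equal-length sequences of serving cell IDs and
    RSS readings; sample_period_s defaults to one sample per second.
    """
    # 1) Number of unique serving cell IDs in the window.
    unique_cells = len(set(cell_ids))

    # 2) Average cell residence time: window duration divided by the
    #    number of maximal runs of the same cell ID (assumed definition).
    n_runs = 1 + sum(1 for a, b in zip(cell_ids, cell_ids[1:]) if a != b)
    avg_residence_s = len(cell_ids) * sample_period_s / n_runs

    # 3) RSS variance within the window (population variance assumed).
    rss_variance = pvariance(rss)

    # 4) Average absolute difference between consecutive RSS readings.
    diffs = [abs(b - a) for a, b in zip(rss, rss[1:])]
    avg_rss_diff = sum(diffs) / len(diffs) if diffs else 0.0

    return {
        "unique_cells": unique_cells,
        "avg_residence_s": avg_residence_s,
        "rss_variance": rss_variance,
        "avg_rss_diff": avg_rss_diff,
    }
```

For a stationary phone that stays on one cell with a constant RSS, this yields one unique cell, a residence time equal to the window length, and zero variance and zero average difference.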

Frequency domain features: These features are extracted 
from the frequency domain using the fast Fourier transform 
(FFT) algorithm over different window sizes of RSS values. 

1) Frequency with the highest energy: The intuition is that 
the movement speed affects the rate of RSS change. 
This can be captured by the dominant frequency after 
removing the DC component. 

2) Signal energy: Defined as the sum of the squared 
amplitudes of the FFT spectrum, which reflects the energy 
in the RSS signal. The slower the speed, the lower the 
energy in the signal should be (Figure 3). 
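
The two frequency domain features can be sketched for one window as follows. To stay dependency-free, the sketch uses a direct discrete Fourier transform, which produces the same spectrum an FFT would. The function names are hypothetical, and two choices are assumptions rather than details from the paper: only the one-sided spectrum is used, and the DC bin is excluded from the energy as well as from the dominant-frequency search.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) discrete Fourier transform (same output as an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frequency_domain_features(rss, sample_rate_hz=1.0):
    """Dominant non-DC frequency and signal energy of one RSS window.

    Requires at least two samples. Excluding the DC bin from the energy
    is an assumption made here so that the constant RSS level does not
    dominate the result.
    """
    n = len(rss)
    spectrum = dft(rss)

    # One-sided bins 1 .. n//2; bin 0 (DC) is skipped.
    power = {k: abs(spectrum[k]) ** 2 for k in range(1, n // 2 + 1)}

    peak_bin = max(power, key=power.get)
    return {
        "dominant_freq_hz": peak_bin * sample_rate_hz / n,
        "signal_energy": sum(power.values()),
    }
```

For a window containing a pure sinusoid with two cycles over 16 samples at 1 Hz sampling, the dominant frequency comes out as 2/16 = 0.125 Hz regardless of the constant offset.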


D. Linear versus Logarithmic RSS Scale 

Typically, RSS is measured in the logarithmic scale. For 
example, using the free space loss model, the received signal 
strength in logarithmic scale (p_r(dB)) at a distance d from the 
transmitter is given by: 

$$p_r(\mathrm{dB}) = p_0 - 10\,\alpha\,\log_{10}\!\left(\frac{d}{d_0}\right) \qquad (1)$$

where p_0 is the power in dB at a reference distance d_0 from the 
transmitter and α is the path loss exponent. Therefore, based 
on the RSS logarithmic scale, equal relative changes in the physical 
distance on the roads from the transmitter (i.e. serving cell 
tower) lead to equal differences in the RSS log space. However, 
this is not the case in the linear RSS scale, where the same relative 
change in distance from the transmitter maps to different differences in 
the linear RSS space, depending on the signal level. Based on this observation, MonoSense 
leverages both the logarithmic and linear RSS readings to 
extract the classification features. This scale diversity reduces 
the ambiguity between classes. 
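
As a small worked illustration of the scale difference, the sketch below converts logarithmic readings to the linear scale and compares the same 3 dB drop at a strong and at a weak signal level. It assumes the readings are in dBm (so the linear unit is milliwatts); the function name and the example values are hypothetical.

```python
def dbm_to_mw(rss_dbm):
    """Map an RSS reading from the logarithmic scale (dBm, assumed)
    to the linear scale (milliwatts)."""
    return 10 ** (rss_dbm / 10)


# The same 3 dB drop in the log scale...
drop_near = dbm_to_mw(-60) - dbm_to_mw(-63)    # strong signal
drop_far = dbm_to_mw(-100) - dbm_to_mw(-103)   # weak signal

# ...is four orders of magnitude larger in the linear scale when the
# signal is 40 dB stronger, since 10 ** (40 / 10) == 10_000.
ratio = drop_near / drop_far
```

Features such as the variance or the average consecutive difference therefore weight the same RSS fluctuation differently in the two scales, which is the sense in which the scales can carry different information.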


E. Transportation Mode Classifier 

We use a decision tree classifier for differentiating between 
the three different modes of transportation due to its simplicity 
and efficient implementation. We use a total of 36 features 
(six main features from Section II-C repeated for both linear 
and logarithmic scales for three different window sizes). 
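
The feature count can be checked by enumerating the combinations literally as described: six base features, two RSS scales, and three window sizes (10, 30, and 60 seconds, as given in Section II-F). The feature and scale names below are hypothetical labels chosen for illustration.

```python
from itertools import product

# Six base features from Section II-C (names are illustrative labels).
BASE_FEATURES = [
    "unique_cell_ids",
    "avg_cell_residence_time",
    "rss_variance",
    "avg_rss_difference",
    "dominant_frequency",
    "signal_energy",
]
SCALES = ["log", "linear"]
WINDOW_SIZES_S = [10, 30, 60]


def feature_names():
    """Return one name per (feature, scale, window size) combination."""
    return [
        f"{feature}_{scale}_{window}s"
        for feature, scale, window in product(BASE_FEATURES, SCALES, WINDOW_SIZES_S)
    ]


assert len(feature_names()) == 6 * 2 * 3  # 36 features in total
```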


F. Discussion 

MonoSense leverages features from different window sizes 
to enrich its feature space and hence obtain higher accuracy. 
As shown in this section, longer windows usually lead to better 
differentiation between the different speeds because more 
information is available. However, longer windows increase the 
latency of estimation, and very large windows (not shown in 
this section) can worsen the performance as they may span 
different speeds within the same window. 

Moreover, smaller window sizes can provide better 
differentiation in some features. For example, shorter window sizes 
can differentiate better between the stationary and walking 
modes for the cell residence time feature (Figure 2b). 

To balance these factors, we use three window sizes in our 
implementation: 10, 30, and 60 seconds. 
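
The parallel windowing can be sketched as follows, assuming one sample per second so that a window of w seconds holds w samples. The function names are hypothetical, and dropping a trailing partial window is an assumption; the paper does not say how incomplete windows are handled.

```python
def split_into_windows(samples, window_size):
    """Split a sample stream into non-overlapping windows of
    window_size samples, dropping any trailing partial window."""
    n_full = len(samples) // window_size
    return [
        samples[i * window_size:(i + 1) * window_size]
        for i in range(n_full)
    ]


def parallel_windows(samples, window_sizes=(10, 30, 60)):
    """Window the same stream at several sizes in parallel."""
    return {w: split_into_windows(samples, w) for w in window_sizes}
```

For example, a 125-second trace yields 12 windows of 10 seconds, 4 windows of 30 seconds, and 2 windows of 60 seconds; each window would then be passed to the feature extraction step.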
TABLE I: Precision using features extracted from the RSS logarithmic scale, RSS linear scale, and both (overall system). Panels: (a) using RSS logarithmic scale, (b) using RSS linear scale, (c) overall system (all features). 

TABLE II: Recall using features extracted from the RSS logarithmic scale, RSS linear scale, and both (overall system). Panels: (a) using RSS logarithmic scale, (b) using RSS linear scale, (c) overall system (all features). 


III. Evaluation 

In this section, we evaluate MonoSense in different real 
testbeds. We start by describing the experimental setup, followed 
by the performance analysis. 

A. Experimental Setup 

We implemented a data collector using the Android 
SDK 2.3.3 (API Level 10). The collector gathers the serving 
cell tower ID, RSS, and time stamp periodically at a rate 
of one sample per second. The GPS location and speed 
are also collected as ground truth. The collector enables the 
user to add custom annotations, for example to manually 
enter the current ground truth transportation mode for traces 
that span multiple modes of transportation. We deployed our 
collector on different Android phones including a Samsung 
Nexus S, Nexus One, Galaxy S 7562, and Galaxy Tab 1000. 
The collected traces were analyzed offline, where the feature 
extraction and classifier were implemented using Matlab. 

We collected a dataset totaling 135 hours over the course of 
eight months, from January to August 2014, uniformly covering 
different modes of transportation. The combined length of the 
traces is 485 km. Four different users were involved in the 
data collection process, across the three operators in Egypt. 

B. Performance Metrics 

We use five-fold cross validation to evaluate our classifier. 
We use two metrics for evaluation, precision and recall, defined 
as: 

$$\text{Precision} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}} = \frac{\text{Correctly detected}}{\text{Total detected}}$$

Precision captures the accuracy of the detected modes 
(percentage of detected modes that are correct). 

$$\text{Recall} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}} = \frac{\text{Correctly detected}}{\text{Total actual events}}$$

High precision indicates that the classifier returns more correct 
results than incorrect ones. On the other hand, high recall 
means that the classifier detects most of the correct modes 
from the ground truth. 

We use a confusion matrix between the different modes of 
transportation for each of the metrics. The values along the 
diagonal indicate the classifier performance. The off-diagonal 
elements quantify the confused classes when errors occur. 
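
The per-class metrics can be derived from a matrix of raw counts as sketched below. The function name and the example counts are hypothetical and are not results from the paper; rows are taken to be the true class and columns the predicted class.

```python
def precision_recall(confusion, labels):
    """Per-class precision and recall from a square count matrix where
    confusion[i][j] is the number of samples of true class i that were
    predicted as class j."""
    result = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        total_detected = sum(row[k] for row in confusion)  # column sum
        total_actual = sum(confusion[k])                    # row sum
        precision = true_positives / total_detected if total_detected else 0.0
        recall = true_positives / total_actual if total_actual else 0.0
        result[label] = (precision, recall)
    return result


# Made-up counts for illustration only.
labels = ["stationary", "walking", "driving"]
counts = [
    [90, 8, 2],
    [10, 80, 10],
    [5, 15, 80],
]
metrics = precision_recall(counts, labels)
```

With these illustrative counts, the stationary class has a precision of 90/105 and a recall of 90/100; averaging the per-class values gives summary figures of the kind reported in the paper.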

C. Performance Analysis 

In this section, we analyze the performance of the classifier 
for the different RSS scales using the different metrics. We 
start by showing the performance for the logarithmic and linear 
scales independently, followed by the combined classifier 
performance using all features (representing our overall system 
performance). 

1) Features extracted from the log scale: Tables I(a) and II(a) 
show the confusion matrices for the precision and recall 
respectively. The tables show that the diversity of features from 
the time and frequency domains, as well as features extracted 
from the different window sizes, leads to high classification 
accuracy. The stationary mode is the easiest to detect. Most of 
the classification errors are between the walking and driving 
classes. This can be explained by noting that low driving 
speeds, e.g. due to traffic congestion, are 
comparable to walking speeds. The driving mode of 
transportation is the hardest to detect due to the wide span of speeds it 
covers compared to the other modes. Nevertheless, the average 
precision and recall are 85.13% and 85% respectively. These 
are further enhanced by combining the logarithmic and linear 
features, as we quantify in Section III-C3. 

2) Features extracted from the linear scale: Tables I(b) and II(b) 
show the confusion matrices for the precision and recall 
respectively based on the features extracted from the RSS 
linear scale. The tables show results similar to the logarithmic 
scale results. Linear scale features perform slightly better for 
the stationary and walking classes, while the logarithmic scale 
features perform better in the driving class. This supports our 
observation that the features are independent and can be fused 
together to provide better performance, as shown in the next section. 

3) Overall system performance: Combined log and linear 
scale features: Tables I(c) and II(c) show the confusion matrices 
using all 36 combined features. The tables show that 
using all features leads to an overall average precision and 
recall of 89.26% and 89.84% respectively. This highlights 
that MonoSense can achieve its goals of ubiquitous, highly 
accurate, and energy-efficient transportation mode detection 
using minimal information from the serving cell tower only. 

IV. Conclusion 

We presented a ubiquitous, energy-efficient, and accurate 
transportation mode detection system, MonoSense, that 
relies solely on the information from the serving cell 
tower. We showed how MonoSense leverages the diversity 
of features in the logarithmic and linear scales, the time and 
frequency domains, as well as different window sizes to extend 
its feature space and achieve high accuracy. 

Real world implementation and experiments using different 
Android phones, spanning 135 hours and 485 km over an 
eight-month period, confirm the effectiveness of MonoSense, showing 
an average precision and recall of 89.26% and 89.84% 
respectively, highlighting MonoSense's ability to meet its goals of 
high accuracy, energy efficiency, and ubiquitous deployment 
on different phone types as well as on the user and cellular 
provider sides. 

Currently, we are expanding MonoSense in different 
directions including leveraging more features, classifying more 
modes of transportation, and implementation on other phone 
operating systems, among others. 